Automatic identification of general and specific sentences by leveraging discourse annotations

نویسندگان

  • Annie Louis
  • Ani Nenkova
چکیده

In this paper, we introduce the task of identifying general and specific sentences in news articles. Given the novelty of the task, we explore the feasibility of using existing annotations of discourse relations as training data for a general/specific classifier. The classifier relies on several classes of features that capture lexical and syntactic information, as well as word specificity and polarity. We also validate our results on sentences that were directly judged by multiple annotators to be general or specific. We analyze the annotator agreement on specificity judgements and study the strengths and robustness of features. We also provide a task-based evaluation of our classifier on general and specific summaries written by people. Here we show that the specificity levels predicted by our classifier correlates with the intuitive judgement of specificity employed by people for creating these summaries.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

General Versus Specific Sentences: Automatic Identification and Application to Analysis of News Summaries

In this paper, we introduce the task of identifying general and specific sentences in news articles. Instead of embarking on a new annotation effort to obtain data for the task, we explore the possibility of leveraging existing large corpora annotated with discourse information to train a classifier. We introduce several classes of features that capture lexical and syntactic information, as wel...

متن کامل

Discourse Mode Identification in Essays

Discourse modes play an important role in writing composition and evaluation. This paper presents a study on the manual and automatic identification of narration, exposition, description, argument and emotion expressing sentences in narrative essays. We annotate a corpus to study the characteristics of discourse modes and describe a neural sequence labeling model for identification. Evaluation ...

متن کامل

The automatic identification of discourse units in Dutch text

The identification of discourse units is an essential step in discourse parsing, the automatic construction of a discourse structure from a text. We present a rule-based algorithm to identify elementary discourse units (EDUs) in Dutch written text. Contrary to approaches that focus on the determination of segment boundaries, we identify complete discourse units, which is especially helpful for ...

متن کامل

Machine Comprehension with Discourse Relations

This paper proposes a novel approach for incorporating discourse information into machine comprehension applications. Traditionally, such information is computed using off-the-shelf discourse analyzers. This design provides limited opportunities for guiding the discourse parser based on the requirements of the target task. In contrast, our model induces relations between sentences while optimiz...

متن کامل

Mining the Web for Discourse Markers

This paper proposes a methodology for obtaining sentences containing discourse markers from the World Wide Web. The proposed methodology is particularly suitable for collecting large numbers of discourse marker tokens. It relies on the automatic identification of discourse markers, and we show that this can be done with an accuracy within 9% of that of human performance. We also show that the d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011